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Abstract 

The steep rise in availability and usage of high-throughput technologies in biology brought with it a clear 
need for methods to control the False Discovery Rate (FDR) in multiple tests. Benjamini and Hochberg 
(BH) introduced in 1995 a simple procedure and proved that it provided a bound on the expected value, 
FDR < q. Since then, many authors tried to improve the BH bound, with one approach being designing 
adaptive procedures, which aim at estimating the number of true null hypothesis in order to get a better 
FDR bound. Our two main rigorous results are: (i) a theorem that provides a bound on the FDR for 
adaptive procedures that use any estimator for the number of true hypotheses (mo), (ii) a theorem that 
proves a monotonicity property of general BH-like procedures, both for the case where the hypothesis 
are independent. We also propose two improved procedures for which we prove FDR control for the 
independent case, and demonstrate their advantages over several available bounds, on simulated data and 
on a large number of gene expression datasets. Both applications are simple and involve a similar amount of 
computation as the original BH procedure. We compare the performance of our proposed procedures with 
BH and other procedures and find that in most cases we get more power for the same level of statistical 
significance. 

Availability: A MATLAB code is available at: 
|http: //www .broad .mit . edu/'^orzuk/matlab/libs/stats/f dr/matlab_f dr_utils .html! 

Keywords: False Discovery Rate; Improved BH; Monotonicity; Gene expression analysis. 

1 Introduction 

The main goal of statistical comparisons (tests) is to calculate the level of statistical significance at which a 
given null hypothesis is rejected on the basis of available data. Researchers use this tool in order to present 
their findings and support their conclusions. Uncontrolled application of single inference procedures in a 
multiple comparison setting can cause a high false positive rate. Special multiple comparison procedures 
are used in order to control the probability of committing such a type I error in families of comparisons. 
The need for improved control over the multiplicity effect in biological experiments became acute in the 
nineties, when the amount of data that could be measured and stored increased thousands fold. Many 
new experimental techniques, which allowed taking a large number of measurements simultaneously were 
developed, along with improved data acquisition and storage capabilities. 

For example, in the case of gene expression microarray measurements, a typical aim is to identify the genes 
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whose expression levels differentiate between healthy (type A) and diseased (type B) subjects. Genes are 
tested one by one for differential expression; the formal way to do this is by posing several thousand null 
hypotheses. A null hypothesis states that a particular variable (e.g. expression level of gene i) is sampled 
from the same distribution for both types A,B; one is interested in identifying variables (genes) for which 
the null hypothesis is rejected (i.e. genes whose expression does differentiate between types A,B). Such a 
finding is referred to as a discovery. Denote by m the total number of hypotheses (e.g. the number of genes 
whose expression levels were measured), and assume that the null hypothesis is true for mo out of the m 
(i.e. mo genes' expression levels do not differentiate the two types). For mi = m — mo the null hypothesis 
is false (the expression levels of types A and B are sampled from different distributions) . A statistical test 
is performed independently for each variable, producing a p-value pi, i = 1,2, ...m. On the basis of some 
thresholding operation on the PiS, the null hypothesis is rejected for R tests. The decision to reject (or 
not) can be correct or false; When the null hypothesis is rejected for one of the mo variables for which it is 
actually true, we have a "false discovery" (type I error). Table [T] presents the possible categories to which 
rejected and non-rejected hypotheses can belong, and the number of hypotheses in each category. 



" ground truth" 
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Table 1: Numbers of true/false decisions taken when testing m null hypotheses 



Out of the R rejected hypotheses the fraction V/R is falsely rejected. The expected value of this fraction 
was termed by (referred to as BH95) as the False Discovery Rate (FDR), 



FDR = E 



R>Q] Pr{R >0) = E(^] (1) 



where here and later in the paper the term R^ = max(i?, 1) is used for brevity. It is required since V/R 
is undefined when R = and thus this case should be treated separately - we follow j^] and replace V/R 
by in this case. 

The original BH95 procedure to control the FDR is given as follows; 

1. Denote by q the desired level, < g < 1, of the FDR and define the following set of constants: 

ai = — , i = l,2,...,m (2) 
m 

2. Sort the p-values pi and re-label the hypotheses accordingly, < p(2) < .... < P(m), such that (i) is 
the index of the hypothesis with the i-th smallest p-value. 

3. Identify R as 

R = max {i : p(^i) < Qi} (3) 
If no such _R > 1 exists, no hypothesis is rejected; otherwise reject all R hypotheses (i) = 1, 2, ...R. 

This procedure has a simple graphical implementation, depicted in Fig. [T] It is referred to in BH95 as 
"step-up"; note that in general there could be more than one intersection point (of the and lines). 
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in which case the step-up procedure identifies the intersection with the largest p-value as R, whereas the 
more conservative "step-down" procedure identifies the lowest one, replacing eq. ((Sjl by 

R = min {i : > Ui] ~ 1 (4) 




Figure 1: Typical examples for the use of the BH95 and our IBH procedures, for a desired FDR value of 
g = 0.1. The sorted p- values (solid line), the of eq. ^ (dashed line) and the 7; from eq. (dot-dashed 
line for IBHsum and solid light for IBHlog) are shown, for (a) leukemia data from 1] and (b) breast cancer 
data from 18]. As indicated in (a), the number of rejections is determined for each procedure by locating the 
(maximal) value i = i? at which the corresponding lines intersect p(j') , (the vertical lines mark the intersection 
point between the lines). 

The bound 

was proved by BH95 for independent tests, and by ,5] for a certain type of 'positive dependency' called 
PRDS [Positive Regression Dependency on each one from a Subset). The value of mo is unknown to the 
researcher, but since mo < m, this procedure leads to the bound: 

Clearly, had we known mo, we could have defined a different set of constants (compare to eq. ((^J) 

mo 

and defining 

R' — max {z : < a^} (8) 

would have obtained a larger number R' > Rof rejected hypotheses (still with FDR < q) , than the number 
R given by the original BH95 procedure, which used m as an upper-bound on mo. This procedure, based 
on knowledge of mo, is called "oracle" (ORG), see 
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Subsequently various improved (also called 'adaptive') procedures were proposed, based on the idea of 
estimating the unknown mo in order to get a more accurate handle on the FDR. These procedures can be 
divided into two major classes: 

1. Procedures for FDR estimation: This approach, pioneered by [3], can be used when one has an 
estimator mo of mo, that satisfies: 



One keeps the number of rejections R fixed, according to the researcher's desire, and computes the 
q- value, q = q{R). This value is set to be the smallest number such that at least R rejected hypotheses 
are found according to the BH95 procedure with parameter q (see Def. [T] eqs. (|2l3p l. Replacing mo 
by its estimator and using eqs. © and © we get for the same number of rejected hypotheses R as 
BH95, an improved, smaller estimator for the FDR, q — ^q. The estimator gives a smaller bound 
on average on the FDR: 



with q' — E{q'). This approach is the preferred one in many biological contexts, when the investigator 
wishes to control R, the number of discoveries made (e.g. diff'erentiating genes to be used in further 
experiments). 

2. Procedures for FDR control: In this approach, one wishes to control the FDR at a preset level q. 
This is achieved by defining ji — iq/rho to be used in the same way as and a'^ (see eq. ([2| and eq. 
^ ), leading typically to a larger number R' of rejected hypotheses (compared to BH95), with the 
FDR still being bound by the desired value q. The advantage of this procedure (presented in Sec. 
[5]) is that one retains control of q, the desired level of FDR. 

We present in this paper two estimators, rho and mo, that satisfy eq. ®, and hence can be used 
trivially for FDR estimation. As opposed to FDR estimation, proving control of the FDR is far more 
involved, and constitutes a significant portion of this paper. We provide two new proven procedures for 
control of the FDR. We first prove control for these procedures when employed in a step-up manner. 
Then, by using a new general monotonicity result for the FDR which we derive, we show that the step- 
down versions of our procedures also control the FDR. Designing better procedures for FDR estimation 
and control has drawn a great deal of attention in recent years, as is demonstrated by the abundance of 
proposed procedures and many theoretical and experimental papers. However, as far as we know only for a 
few such procedures has control of the FDR been rigorously established: the original BH95 procedure Q|, 
the two- stage and multiple-stage adaptive BH procedures [ff] (we refer to the latter as BKY), and Storey's 



procedure [23] (referred to as STS). All these procedures (except, of course, BH95) claim to give improved 
power over BH[95. All are derived from a better estimation of mo. Almost all proofs for FDR control 
assume independence of the p- values (with the notable exception Q]). Thus, far less is known about the 
behavior of FDR procedures under dependency, where most of our understanding comes from simulation 
studies. In addition, the FDR, by its definition (eq. Q), is an expected value. However, the fraction of 
the false discoveries V/R'^ is a random variable. While the mean value (FDR) was extensively studied, 
far less attention has been devoted in the literature to the behavior of this random variable, its variance 
and entire distribution. We therefore perform simulations whose purposes are: a. To study the behavior 
of the various procedures under dependence, where analytical results are harder to establish, and b. study 
the distribution of the fraction of false rejections {V/ R^), which has implications on possible violation of 



mo < E(rho) < m 



(9) 




(10) 
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the bound for a particular realization. Our simulations provide a comparison of our new procedures to the 
known ones mentioned above and we show that our new procedures compare favorably in most cases of 
interest. We analyze simulated and real data, and show that for both the new procedures almost always 
reject more hypotheses than BH95, while maintaining control even under dependence, and we therefore 
refer to these procedures as 'Improved BH' (IBH). The real data which we use is gene expression data 
obtained from various cancer studies, and show that our new procedures allow rejection of more hypotheses 
at a given confidence level and thus increase discovery power. 

2 Preliminaries and theorem on control 

In this section we present a theorem which provides a general way to build an improved bound for controlling 
the FDR using an estimator for mo. Two examples of practical implementation of the theorem lead to 
useful procedures described in the next section. The working assumptions we use here is that the p-values 
are independent. The theorem is not proven for dependent variables but our simulations indicate that in 
most cases we do control the FDR even under dependence (see Sec. [Sj. 

Our first step is defining mathematically a family of estimators mo for mo. We define a general modified 
BH procedure, in which any one of these estimators is used by replacing m in the original BH95 procedure 
(see eqs. (|2l3p ) by mo. Throughout this section and the rest of the paper we denote for convenience 
Pi,,j = Pi, ■■,Pj. We also denote p = (pi, ..jPm) the vector of all p-values. 

Definition 1 An estimator for mo is a family of functions mo = rrig'"' : [0, 1]"" ^ R, mo = mo(p). We 
usually omit the index '™'' as it is obvious from the context. We say that mo is a monotonic estimator if 
it satisfies: 

1. m^™'(pi, > m^'"^(pi, ^Pi>p'i, 1 = 1,2, ,,m, m>l 

2. ml^\px,..,p^,..,p„C)>m'l^~'^\px,..,p^-l,p^+^_,..,prn), Vi = l,2, ,,m, m>2 

Definition 2 Assume w. I. o.g that we have m hypotheses the first mo of which are null. Let p — {pi, ..,pm) 
be the corresponding p-values. The modified step-up BH procedure with estimator mo is defined as follows: 

1. Compute mo = mo{p). 

2. For each i define: 

mo 

3. Order the p-values in an increasing order: < .. < P(m)- 

4. Let R = max{j : pj^j < 7^}, and reject the hypotheses (1), (2), ...{R) (If no such R exists, don't reject 
any hypothesis). 

Note the similarity of the procedure to the original BH95 procedure, with the addition initial step of esti- 
mating mo, and the different set of constants used to determine R. The modified step-down BH procedure 
is defined in the same way, except that in step 4 we take R — min {i : > Oi} — 1. 

The next theorem gives the bound on the FDR for the above procedure under the above assumptions 
(a very similar result was given by Q]): 

Theorem 1 Let mo = mo {pj be a monotonic estimator for mo . Consider the modified step-up BH proce- 
dure defined above. Let mQ^{p) = mo(p2,..,Pm) be the same estimator, but disregarding the first p-value 



5 



Pi. Assume that the null p-values are i.i.d. (7[0, 1]. Then the procedure satisfies: 



FDR = E 



V 



R+ 



< TUoqE 



(12) 



where p\ is a representative to one of the p-values from the true null p-values. 

The proof of Thm. [T]is given in Appendix El for completeness. In general, a direct computation, or 
bounding of the FDR for a given procedure is a demanding task, which depends heavily on the procedure's 
details, and suffers from complicated dependence on the rejection of different hypotheses, reflected in the 
computation of E\V/R'^] (this is true even if the p-values themselves are independent) and therefore there 
is no general way to prove FDR controlling properties of various procedures. The advantage of Thm. 1 is 
that it provides a direct method for proving control for a wide class of procedures, by simply bounding the 
reciprocal mean of the estimator for mo. In the next section we use this theorem to prove control of the 
FDR for two procedures, based on different estimators mo, and mo which we propose. We are not aware 
of a direct way for proving control of the FDR for these procedures, thus demonstrating the power and 
generality of the theorem. 



3 The proposed procedures 



In this section we propose two FDR controlling procedures. We show that they achieve direct control of 
q, the desired value of the FDR, while producing a list of R' discoveries satisfying almost always R' > R, 
the corresponding BH95 value. The procedures are particular cases of Def. [2] According to Thm. [1] 
any estimator that satisfies our monotonicity assumption bounds the FDR by FDR < moqE[l/ml^]. 
Therefore, in order to show that the FDR is controlled, it suffices to bound E[l/rhQ^]. In particular, if we 
want to achieve a certain FDR control level q, we need to verify that 



E 



m« 



mo 



Our first estimator is based on 



T-'o = 2^Pj 



(13) 



(14) 



mo was used by [19| for estimation but without proving control of the FDR. The second estimator is based 
on 



mo 



^, = -X^log(l 



(15) 



For both estimators we first show that eq. (O is satisfied and hence both can be used for FDR 
estimation. Next we describe the procedure to be used for control of the FDR, which is proved by showing, 
for slightly modified versions (see blow) of both estimators, rho and mo that the bound eq. (|13p is satisfied. 
Note that both mo,mo are monotonic estimators according to Def. [T] Our claims are: 

1. Both procedures control the FDR - for the list of R' discoveries we have FDR < q. 

2. In nearly all cases of interest the number of discoveries obtained by our procedures exceeds the 
number obtained (for the same value of q) by the BH95 procedure, i.e. R' > R. This holds since 
nearly always mo < m (exceptions occur when there are almost no false hypotheses, i.e. m and mo 
are very close). 
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We note that a reasonable requirement from an estimator for mo should be that it is (approximately) 
unbiased, at least when all hypotheses are null. Otherwise we will get a systematic over/under-estimation 
of mo and a corresponding over/under-estimation of the FDR. The two expressions used above indeed 
give unbiased estimators for mo. In order to show control over the FDR using Thm. [TJ we have to apply 
small corrections to both estimators, turning them into conservative estimators (i.e. over-estimating mo). 
This is due to two reasons; the first is that the bound on the FDR given in Thm. [T]uses mp'^ (rather 
than mo) and thus we 'lose' one of the p-values and need to correct for that. The second reason is that 
mQ*^ appears in the denominator, and its fluctuations have asymmetric infiuence on the FDR bound. This 
can be illustrated by using Jensen's inequality which gives _E[l/mQ'^] > 1/ E[rn"(p], thus showing that an 
unbiased estimator for mo will typically show a bias when its reciprocal is used. Nevertheless, we show 
that these two effects can be overcome by applying a small correction, which becomes negligible as the 
number of hypotheses go to infinity. 



Our first estimator is based on mo (see eq. (HH)) that was also used by 

[19! 

but only for estimation 

and not for control. Since for the mo variables for which the null hypothesis holds we have p*"^"*^ ~ 
[/[0, 1] => EIpY^"] = \, it is trivial to see that i5[rno] > mo. To show that -^[mo] < m, we have to 
make a further assumption regarding the alternative p-values p^i^^^": We denote the distribution of p^^^'^^ 
by ./'/"''"^, i.e. p^t^'"' ~ j/aiae ■!■£ ^j^g g^^g stochastically smaller Q] than the uniform distribution, 
^jfaise U\p, 1]), we have < ^ which immediately implies -E[mo] < m, ( a distribution density 

/ is said to be stochastically smaller than a distribution density g, f <st g, if F(x) = f(i)dt > G{x) = 
I-oo9it)dt Vxe (-(X),cx)) ;2] ). 

We introduce the following modified estimator: 



where C{m),s{m) are universal correction factors that ensure that the condition (|13p is satisfied (for details 
see AppendixlB} . The correction factors were computed numerically and are presented in Fig. [S] Note that 
when m — » oo, C 1 and s/m 0, and therefore the corrections become negligible and the estimator 
mo reduces to m-o. 



In this section we propose another estimator for mo, based on jtiq, (see eq. (|15|l ). Again, since for 
i — 1,2. ..mo we have pj^""" ~ f^[0, 1] E[—log{l — Pi)] — 1 and therefore i5[mo] > mo. Furthermore, 
if all the alternative p-values p^^^^'^ have a distribution which is stochastically smaller than the uniform 
distribution {flf'^Hp) <st f/[0, 1]), then E[-log{l ~ pf^"")] < 1, and therefore E[m'o] < m. 

The advantage of using the second estimator rfiQ over mo is that when flf'^ip) <st (7[0, 1], the 
alternative hypothesis generates p-values skewed to the left. Since — log(l — p) < 2p, Vp < | (see eqs. 
(|14p and (|15|) ). this typically implies jtiq < rho and thus mg is typically closer to the true mo. A possible 
drawback is that the variance of ttiq is typically larger than that of m-o, which might result in an instability 
in the estimation of mo. 

Proving control of the FDR for mo is difficult since we need to bound 1/mo which has a complicated 
distribution. Here we show that the distribution of ttiq is much simpler, and this enables us to prove control 
of the FDR by introducing only a slight additive correction. 



3.1 The IBHsum procedure 





(16) 



3.2 The IBHlog estimator 
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(a) 




Figure 2: The correction functions C{m) and s{m)/m (see eq. (jl6p ). As m ^ cx) the multiplicative correction 
C{m) approaches one, while the (normalized) threshold s{m)/m (used when tjiq < s(rn)) goes to zero, thus 
rho reduces to the un-corrected tji'q 

Claim 1 Define the (corrected) estimator: 

m 

mo = 2 + mo = 2-^log(l-p,) (17) 

j=i 

Assume that the null p-values are i.i.d C/[0, 1]. Then the modified BH procedure with estimator rho arid 
parameter q controls the FDR at level < q. 

The proof is achieved by bounding E[l/rhl^] and then using Thm. [l] See Appendix [Cl for full details 

4 Is the FDR monotonic? 

In this section we take a slight de-tour from the study of our proposed procedures to investigate the 
following question: is it generally true that by modifying an FDR procedure to be more stringent, one 
is guaranteed to obtain a more conservative control on the FDR? The motivation for dealing with this 
question in the context of the current paper (which deals with the control property of a modified BH 
procedure) comes from the fact that Thm. [T]was proved only for step-up procedures, which leads us to ask 
whether it holds also for the more conservative step-down case. Monotonicity is a natural property that one 
might expect when performing statistical tests, as it allows the researcher to choose a trade-off between 
maximizing the statistical power and minimizing the risk of making false discoveries. The analogous 
question for a single hypothesis is whether taking a more conservative (lower) p-value cutoff guarantees to 
reduce the risk of making a type-I error, and is trivially answered in the affirmative. Our formulation of 
the question in the multiple-hypothesis settings using FDR is as follows; Given two procedures, B^^\ S'^-* 
(possibly parameterized by q or other parameters), and assuming that for any realization of the p-values, 
B^^^ passes more hypotheses than B'^-*, is it true that FDR^^^ < FDR^^^l while this statement seems a 
natural and plausible property of FDR procedures, we are not aware of any previous treatment of it in the 
literature. Here we show that under certain monotonicity conditions on the alternative hypothesis p-values 
distribution, one can prove this monotonicity property of the FDR. 
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Theorem 2 Let p — {pi..m) be a set of independent p-values. Assume that f, the marginal probability 
density function of the alternatives, is monotonically non-increasing and differentiable. Let ij'*' be two 
threshold FDR procedures rejecting i?'*' (p) hypotheses and each having FDR^'\ i = 1,2. A ssume that for 
any q, R'^^\p) < R^^\p), Vp. Then it also holds that FDR^^^ < FDR^'^K 

The proof is given in Appendix [D] A particular application of the above theorem is showing that 
step-down procedures give better FDR then step-up procedures. Thus, we immediately get: 

Corollary 1 The statement of Thm, \^holds also for the step-down procedure, provided that the alternative 
f is monotonically decreasing. 

The above conditions for monotonicity might appear a bit restrictive, and one could could hope to 
relax them - for example require only / <st (7[0, 1] instead of monotonicity. We have found that, perhaps 
surprisingly, monotonicity of the FDR does not hold under such relaxed conditions, by giving an example 
in which FDR monotonicity is violated, even for a simples case of independent test statistics (both null 
and non-null), when / <st f/[0, 1], and when the FDR procedures themselves are monotonic. It is thus 
not obvious at all that in practice we will always observe a monotonic behavior of the FDR, and thus it is 
possible to get a higher FDR for a more conservative procedure. 

Example 1 Let m = 3 and mo = 1. Let the two alternative hypotheses p-values be pi ~ eC/[0, e] 4- 
(1 — e)5{pi — e) for some < e < 1. Thus, P2,Ps, a,re 'truncated' uniform r.v.s., having 1 — e of their 
mass concentrated at e, and the rest (e) uniformly distributed on [0,e]; their distributions are stochastically 
smaller than f/[0, 1]. For simplicity of computations we assume that e « 1 and thus look only at the 
first order in e, although the example's conclusion holds for any e > 0. Let P'"*^' be the procedure always 
rejecting the lowest p-value and p'^' be the procedure rejecting the two lowest p-values (we assume that ties 
are handled in the same way by both procedures, e.g. by taking p-values in lexicographic order - the precise 
tie-breaking rule does not change the example's results). We next compute the FDR for both procedures: 

FDR^^^ = Pr(pi < p2,ps) = e[(eV3 + 26(1 - e)/2 + (1 - ef] = e + O(e') (18) 

FDR^^^ = (1 - Pr(pi > P2,P3))/2 = [1 - (1 - e) - e^/3]/2 = e/2 + O(e') (19) 

Thus for e small enough FDR^^^ > FDR^'^^ and the more conservative procedure leads, in fact, to a 
higher FDR. 

5 Synthetic data obtained by simulations 

We applied our method, as well as several others (see below), to synthetic data obtained by simulations 
performed along the lines of with full details presented in Appendix [E] The advantage of working 
with synthetic data is that several parameters of interest are under full control, and one can investigate 
their effect on the quahty of different procedures and bounds. Furthermore, by performing repeated sim- 
ulations, one can determine not only the {expected value) FDR but also the entire distribution of V/R'^ . 
One should bear in mind that results based on specific simulations might have limited applicability and are 
hard to generalize, since the simulations use specific configurations (e.g. data distribution, test to deter- 
mine p-values, hypothesis dependency structure etc.). A comprehensive simulation capturing all possible 
behaviors of the hypothesis is infeasible, but we have tried to explore various different plausible scenarios 
which might be encountered in practice, by changing the number of (total and null) hypothesis and their 
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dependency structure, with both positive and negative correlations. The simulations produce two kinds 
of Gaussian random variables: Zi, ...Zmg , sampled from the standard normal distribution Pq = N{0,1), 
and Zmo+i, ---Zm, sampled from Pi = N{fii, 1), centered on fii > 0. All variables (both null and non-null) 
are sampled with covariance p, {0 < p < 1): at the extreme cases, setting p — corresponds to indepen- 
dent variables whereas p = 1 to full (deterministic) dependency. For each Zi the corresponding two-tailed 
p-value is obtained, pi = 2${~\Zi\), where "1> is the standard Gaussian cumulative distribution function. 
The obtained p^'s have a uniform U[0, 1] distribution for i — 1, .., mo (corresponding to the null hypothesis) 
and a distribution stochastically smaller than uniform for i — mo + 1, ..,m (the alternative hypothesis). 
A set of m such variables constitutes a single instance or realization of the data to be analyzed. To get 
accurate estimates of the FDR and the V/R^ distribution, we generated for each simulation 50000 such 
realizations, which generally gave highly accurate and reproducible estimates. Under the null hypotheses 
all variables are sampled from the first distribution, m p-values are calculated accordingly and used as 
input to one of the procedures with a desired FDR bound q, producing a list of R rejections. As opposed 
to real data, here one can go back and identify those V among the R that were falsely rejected (i.e. were, 
in fact, selected from Pq). This way one can keep track of the true values of V/R'^ , their mean (calculated 
over a large number of instances), variance, etc. 

One important goal of the simulation is comparing our procedures to existing ones. Specifically, we 
compare our procedure to: (1) the BH95 procedure as described in the introduction, (2) the BKY procedure 
which defines a local (i-dependent) estimator for mo, given by iho^^ — m+l — i{l — q), and uses it in the 
step down manner of the BH95 procedure with g* = qm/niQ^^ , (3) the STS procedure which introduces 
^STS _ ^^_|^ 1 — r(A))/(l — A) as the estimator for mo where r{\) — i^{pi < A}, and then uses the step-up 
BH95 procedure, with q* — gm/mo"^^, with the requirement that all the rejected Pi < \ (throughout this 
paper we used the STS procedure with A — 0.5, like the authors did in their paper). 

We present here two kinds of results derived from such simulations. First we compare the values of 
FDR = E{V/R+) obtained by the procedures discussed above: BH95, BKY, STS, IBHsum and IBHlog 
when the hypotheses are dependent. In particular, we demonstrate that for positive correlations p > our 
IBH as well as the BKY procedures yield for a given desired value of q, an FDR that is either less than q 
or exceeds it slightly. On the other hand the STS method produce, for p > 0, values of FDR that exceeds 
q by a large margin. The second aim is to assess the extent to which the value of V/R^ , obtained for a 
particular realization, will violate the bound, especially for the IBIf methods. 

As an overview we start by presenting in Fig. [3] the performance of our proposed IBHsum procedure 
for fixed m = 500 and q — 0.05, 0.2, and for a wide range of the parameters mo/m (fraction of alternative 
hypotheses) and pi (signal strength), by estimating the expected value FDR = E{V/R^) from our simu- 
lations. Fig. I3K and c are for the independent case and show both step-down and step-up results. As we 
can see, the two become identical when the signal (pi) is strong or when mo/m is small. Fig. |3l3 and d 
are for the positively dependent case (p — 0.8) for which the procedure is not proved to control the FDR. 
Indeed we can observe in Fig. [SJj violation of the FDR level q for large signals (pi); this violation of the 
bound for the dependent case will be discussed later. 

5.1 Comparison of several methods under dependency 

Here we fixed the signal parameter pi = 3.5, and varied mo/m between 0.2 and 1 (for m — 500). We 
present, in Fig. [4^, c and e results obtained for p = (complete independence) and in Fig. |4Jd, d and f for 
p = 0.8 (strong dependence). For each instance we applied the five procedures with q = 0.05. For STS we 
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Figure 3: Isolines of E{V/R'^), measured for the IBHsum procedure by simulations, presented in the 
{fix, mo/ m) plane. The solid lines are for the step-up procedure and the dashed lines for the step-down 
procedure, (a) and (c) are for the independent case {p — 0). (b) and (d) are for the positive dependency case 
p = 0.8. The FDR levels are q = 0.05 in (a),(b) and q = 0.2 in (c),(d). In (b) we find E{V/R+) > 0.05 for 
large p-i, in violation of the bound q = 0.05. The step-up and step-down procedures tend to coincide under 
dependency, while for independent p-values the step-down procedure is more conservative, especially for weak 
signal (small p,i). 

chose A = 0.5, and our IBHsum and IBHlog were employed in a step down manner. Fig. and b present 
for each method the mean value of V/R^ , as a function of mo/m. These means provide excellent estimates 
of E{V/R^), and they reveal that as expected, for p = all methods satisfy the bound E[V/R^) < q. 
The STS and IBH come closest to saturating the bound, with BKY slightly lower and BH95 significantly 
lower. The figures show also the result obtained by an "oracle", namely the procedure that uses the known 
value of mo in order to determine R' according to eqs. ([7} and ([8]). 

For p > no proved upper bound exists for either of the BKY, STS or IBH procedures. Furthermore, 
the proof of Q] for the BH95 procedure does not hold for two-tailed tests: indeed, as can be seen on Fig. 
HJj, the FDR obtained by the oracle procedure (slightly) violates the bound q = 0.05 for mo/m < 0.3, 
in agreement with the violation reported in [3]. Therefore it is important to assess the extent to which 
E{V/R^) obtained by each of these methods violates the bound q in the presence of positive correlations 
between the hypotheses. As seen in Fig. |4]3, for p = 0.8 the STS method produces a measured FDR that 
overshoots the value q = 0.05 of the bound by more than twice, for most of the range of mo values studied. 
In comparison, the other methods (BH95,BKY,IBHsum) provide FDR which remains below the bound or 
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exceeds it slightly for a narrow range of mo. The IBHlog procedure also violates the bound for nearly the 
entire range of mo/m, but by much less than STS. 
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Figure 4: Results obtained for synthetic data with m = 500 hypotheses; mo was varied, the FDR was set 
at g = 0.05, the mean of the distributions Pi was fii ~ 3.5 and the data were drawn either with covariance 
p ~ [(a), (c) and (e)] or p = 0.8 [(b), (d) and (f)]. Six methods were compared: oracle (ORC), BH95, 
BKY, STS and our two IBH procedures (in a step down manner), showing E{V/ R^) in (a) and (b), the power 
E{S)/mi in (c) and (d), and the standard deviation (st.d.) of V/R^ in (e) and (f), for the independent case 
and positively dependent cases, respectively. 

We conclude these comparisons between the different procedures by presenting, in Fig. and d their 
power, measured as the fraction of correctly rejected hypotheses, or 'True Discovery Rate'. For each 
realization we calculated S = R — V and plotted the ratio S/mi — {R— V)/{m — mo), averaged over all 
instances. This measure of power is one minus the type two error rate, known as the False Non-Discovery 
Rate T/mi (Q)- For the independent case p = the power of the ORC, BKY, STS and both IBH 
procedures are very close and much better than that of BH95. For p = 0.8 STS has the highest power, 
followed closely by the oracle, both IBH and BKY, with a large gap to BH95. Again, one should bear in 
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mind that STS has the largest number of discoveries 7?, at the cost of violating strongly the bound of 0.05 
on the FDR. Interestingly, there is no simple monotonicity relationship between the values of the FDR, 
J5(V/J?+), and the True Discovery Rate E{S/mi). 

Fig. shows the standard deviation (st.d.) of V/R'^ for the independent case, and Fig. [^T for the 
positively dependent case. As can be seen when the p- values are independent the st.d. is very similar for all 
the procedures, but increases steeply as mo/m 1. In the case of dependent p- values the situation becomes 
worse; for nearly the entire range of mo/m the coefficient of variance cv = at.d.{V / R^) / E{V / R^) is greater 
than 1. Also, as will be mentioned below, for real data the st.d. of the STS procedure is significantly higher 
than that of the IBH. These high values of st.d. result from the FDR definition, since the expectation of 
V/R^ takes into account many realizations with R — that give, by definition, V/R'^ = 0, making the 
distribution of V/R'^ very non-symmetric. A comparison similar to the one presented in Fig. |4]for q = 0.05 
is presented in Appendix |E] Fig. [TU]for q = 0.2, and provides similar observations. We thus conclude that 
for p = our IBH procedures provide, an expected improvement over the BH95 in therm of power and 
saturation of the bound and their performance is comparable to that of the other adaptive methods tested. 
For dependent variables STS violate the bound on E{V/R'^) much more than the IBHlog and the IBHsum 
which violate it only slightly. 

5.2 Applicability for a particular realization 

Controlling the FDR at a level q means that the average fraction of false rejections is no larger than 

q. It could still be the case that on average the fraction of false rejections is controlled, yet for a large 

percentage of the realizations one gets many false rejections and a high proportion of false discoveries. In 

contrast to the average behavior, captured by the FDR definition, questions involving the distribution of 

false rejections, affecting the behavior of a particular realization, were not studied much in the literature 
I ' 

(a notable exception is T7j who studied the variance of R). We therefore set out to address the issue 
of validity of the bound for a particular realization, by calculating for the synthetic data the probability 
Pr{^ < q)- This was done for q = 0.05 for the six procedures (ORG, BH95, BKY, STS, IBHsum and 
IBHlog, the latter two in step-down mode). The probability Pr{-^ < q) was estimated by computing, for 
each procedure, the fraction of realizations in which we indeed got < q. 

In such a comparison one should bear in mind that a conservative procedure, such as BH95, restricts 
the discoveries much more than a procedure that produces tight bounds (such as the oracle). For example, 
looking at Fig. we see that the mean value E{V/R'^) of BH95 is much lower than q — 0.05, and hence 
the weight of the tail of the distribution of V/R'^ values that "leaks" to V/R'^ > 0.05 is very small, whereas 
for the oracle, which has E{V/R'^) ~ 0.05, the probability of exceeding 0.05 is close to 0.5, and if we want 
to guarantee that Pr{V/R'^ < -B) ~ 1, we must set B at a value which is significantly larger than the 
FDR bound q. As seen in Fig. [S^, the results of IBH are slightly more conservative than the oracle in 
the case of independence, while all improved procedures have fairly similar results. In the case of strong 
dependency. Fig. the differences between the procedures are more pronounced; the STS is the most 
permissive procedure. 

It is very interesting to see that in the case of positive dependent statistics the probability to violate 
the bound is smaller, although E[V/R^] is larger. This is again due to the fact that in these cases we 
get R = for many realizations, which means that V/R'^ = 0, i.e. the variance of V/R'^ is increased for 
positive correlations, whereas for the independent case V/R'^ is very likely to be close to its expectation. 
Further study on the distribution of V/R'^ is required in order to shed light on the behavior of different 
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procedures for particular realizations. 

Fig. [SJ; and d present the cumulative distribution function (CDF) of V/R^ for specific set of param- 
eters, m = 1000, mo/m = 0.7, /ii = 3.5, g = 0.05, and the different procedures to be compared, for the 
independent case (Fig. [S};) and for the positive dependence case (Fig. [5]1). We would like to emphasize 
two points: 1. the CDF of our improved procedures have very similar behavior to the other improved 
procedures. 2. while in the independent case the distribution is close to symmetric, under dependency the 
distribution is very non-symmetric, and hence controling the mean (of V/R^) is almost irrelevant. 




Figure 5: (a) and (b) shows the probability that a single instance satisfies the desired FDR level g as a function 
of mo/m. Results are shown for simulated data with m = 1000 hypotheses, the mean of the distribution Pi 
was jii — 3.5, the FDR bound was set to q ~ 0.05. Five methods are compared: ORG, BKY, STS, and our 
two IBH procedures (in the step-down manner), (a) p = and (b) p ~ 0.8. The oscillatory behavior of some 
bounds is caused by finite size effects, (c) and (d) shows the cumulative distribution function of V/R^ for 
niQ/m = 0.7, {c) p — Q and (d) p = 0.8, (obtained from 10^ realizations). 



6 Application to gene expression data 

As an ultimate tests for their utility, we wanted to asses the performance of our new procedures on real life 
data, which typically provide complex and unexpected dependency structures which are hard to capture 
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in simulations. We therefore applied our procedures that were described in Sec. [3] to publicly available 
expression data. First we present in full detail how our procedures were applied to two datasets. Next, our 
procedures were applied to 33 datasets and results were compared with those obtained by several other 
procedures: the original BH95 and the improved bounds of BKY Q] and STS [2^ with A — 0.5. 

6.1 Detailed application of our procedures 

The first dataset used is that of 1] who studied several types of childhood leukemia. We focus here on 
search for genes whose expression separated 6 patients with normal bone marrow from 11 T-Cell Acute 
Lymphoblastic Leukemia patients, which yielded a large number of discoveries (differentiating genes). 
The number of hypotheses (e.g. potentially differentiating probesets) was m — 21288; the corresponding 
reported p- values were ordered and plotted on Fig. [T^. Our estimators for mo, obtained using eq. (|16|l 
and (|17[1 for this data, were mo = 7093, mo — 6380, and the estimated numbers of discoveries were 
m - mo ~ 14000, m - mo ~ 15000. 

The second study, of on breast cancer, had a relatively small number of discoveries. The aim was 
to find genes that differentiated early discovery breast cancer cases of poor and good outcomes, i.e. were 
difi'erentially expressed between tumors obtained from 38 subjects that died of the disease and from 121 
patients who were alive. The number of hypotheses was m = 44611, and our p- values based estimators for 
mo (plotted in Fig. [TJj) were mo — 38587, mo = 37580. 

For both studies we have set the desired FDR value at g = 0.1. We plot in Fig. [T]the sorted p-values 
versus i /m for these two datasets. In each of the two figures we show three FDR lines; the Oi of BH95 
(see eq. ^) and the values of 7^ corresponding to our two procedures, (see eq. 

For the first dataset the BH95 procedure yields at g = 0.1 a large number of R = 0.6065-21288 — 12912 
discoveries (see Fig. [T^). When we apply our procedure we get, at the same FDR, R' = 0.746 • 21288 — 
15884 (for the IBHsum) discoveries, i.e. 23% more. 

The BH95 procedure yields for the second dataset (at q = 0.1) R — 499 discoveries. When we apply 
our procedure we get, at the same FDR, R' — 621 (for the IBHsum) discoveries, i.e. 24% more. 

6.2 Applying our procedures to many datasets 



We downloaded from the ONCOMINE website [21( p- value vectors that were obtained from 33 comparisons, 
performed on expression data from 19 studies of various types of cancer: 

H. [25I. [2^. Iz^. [2a. [29I. [sol [31I1 . Depending on the biological question at hand, either one or two-tailed tests 
are appropriate. Therefore we applied our procedures to both test types. We focused on two opposing 
scenarios: those with a small number (less than 2% of m, for the BH95 procedure with q = 0.05) of 
discoveries, and those with a large number (more than 10% of m). The 33 sorted sets of pi values are 
plotted, versus i/m, in Fig. [6l separately for the four types of comparisons that were made (one/two-tailed 
test, low/high number of discoveries). 

As can be seen in Fig. |6l for each type of comparison the sorted p-value curve has a typical shape. 
In the case of a large number of discoveries. Fig. and c, the curve is more convex (and flatter near 
zero) than in the case of a small number of discoveries, Fig. [Sh and d. Another clear difference is between 
the two-tailed (Fig. [S^ and b) and the one-tailed (Fig. ^ and d) sorted p-value curves. In the case of 
two-tailed tests, the entire curve is convex, while for one-tailed tests the right side of the curve is concave; 
the reason is that in the latter case there are very often some hypotheses that are shifted, with respect to 
the null hypothesis, in the direction opposite to the one tested for by the one-sided test (for example, if 
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Figure 6: Sorted p-value vectors from 33 expression datasets of various cancer-related comparisons: (a) - two 
tailed tests with large numbers of discoveries, (b) - two tailed tests with small numbers of discoveries, (c) - 
one tailed tests with large numbers of discoveries, (d) - one tailed tests with small numbers of discoveries. 

one looks for up-regulated genes, there are typically also many down regulated genes, which produce very 
high p- values). For detailed treatment of FDR estimation in the case of one tailed tests see [19|. 

We compare here the performance of five procedures: the BH95,BKY,STS, IBHsum and IBHlog (both 
IBH in the step-down mode). For each of the improved procedures we determined the ratio between the 
number of rejected hypotheses it yielded and the number of hypotheses rejected by BH95. We present in 
Table [2] the mean value of this figure of merit and its standard deviation, calculated for the datasets of 
each of the types of comparisons mentioned above, at q — 0.05 and q = 0.1. 

Inspection of Table [2] reveals that for types (a),(b) - of two tailed tests, irrespective of the number of 
discoveries and FDR level, STS and both IBH procedures give significantly higher improvement over BH95 
than the BKY procedure, with STS performing slightly better than IBHlog, followed by IBHsum. For the 
one-tailed test with large numbers of discoveries (type (c)) the mean improvement of BKY is the highest 
while STS and IBHsum are quite similar. IBHlog fails dramatically in this case due to the abundance of 
p- values close to one, giving an over-estimation of mo. For type (d), one tailed tests with a small number 
of discoveries, IBHsum is slightly better than STS and both yield a significantly higher improvement than 
BKY. It should be noted that in all four types and for all values of FDR, the standard deviations of V/R'^ 
of the STS method are significantly higher than those of BKY and the IBHsum procedures. Furthermore, 
as shown in Sec. 15.11 (see Fig. HJa), in the case of positively dependent test statistics the STS procedure 
loses control of the FDR in a much more drastic manner than our IBH procedures. Since we expect that 
correlations between the expression profiles of different genes will be present in most data, the STS method 
may produce unreliable values of the figure of merit presented here. 

In summary, our IBH procedures constitute in all cases a significant improvement over the original 
BH95; in all but one of the comparison types the improvement is significantly better than that of the BKY 
method. Comparison with STS yields mixed results, but the edge of STS over IBH in two of the four 
comparison types is overshadowed by the fact that STS does not provide a reliable bound for datasets with 
positive correlations between probe sets, while IBH remains reliable. 
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q 


BKY 


STS 


IBHsum 


IBHlog 


a. Two tailed, large number of discoveries (10 studies) 


0.05 


1.110 
(0.043) 


1.239 
(0.138) 


1.200 
(0.110) 


1.222 
(0.130) 


0.1 


1.155 


1.258 


1.213 


1.237 


(0.057) 


(0.117) 


(0.087) 


(0.102) 


b. Two tailed, small number of discoveries (10 studies) 


0.05 


1.003 
(0.003) 


1.316 
(0.197) 


1.231 
(0.140) 


1.291 
(0.179) 


0.1 


1.017 


1.308 


1.230 


1.275 


(0.027) 


(0.161) 


(0.117) 


(0.137) 


c. One tailed, large number of discoveries (8 studies) 


0.05 


1.049 
(0.019) 


1.011 
(0.033) 


1.014 
(0.026) 


0.108 
(0.306) 


0.1 


1.062 


1.012 


1.014 


0.108 


(0.026) 


(0.0340 


(0.024) 


(0.305) 


d. One tailed, small number of discoveries (5 studies) 


0.05 


0.998 
(0.020) 


1.027 
(0.052) 


1.025 
(0.017) 


0.882 
(0.123) 


0.1 


1.004 


1.028 


1.031 


0.888 


(0.031) 


(0.079) 


(0.022) 


(0.120) 



Table 2: Comparison of the improvement in power (ratio between numbers of rejected hypotheses with respect 
to the BH95 procedure: V/VBmz) of several methods: BKY [e!], STS 24 1 IBHsum and IBHlog in the step- 
down version. Mean values and standard deviations (in parentheses) are given for each of the four types of 
comparisons. 



7 Discussion 

We addressed the problem of controlling the False Discovery Rate in the case of a large number of compar- 
isons, or hypotheses to be tested simultaneously. Providing a reliable and possibly tight bound on the FDR 
is an issue of major importance for analysis of high-throughput biological data, such as obtained using 
gene expression microarrays. We presented here two estimators of mo, the number of true null hypotheses. 
We proved that both estimators can be used for FDR estimation and, more importantly, for FDR control. 
Thus, we added two procedures to the rather limited repertoire of improved FDR procedures for which 
control of the FDR is known to hold. Our proof of control relies on a general theorem, which provides 
a bound on the FDR for improved procedure using any estimator mo(pi, ...pm) provided a condition of 
monotonicity is satisfied, and one is able to bound the reciprocal mean of the estimator. In addition, we 
proved a novel result, that FDR procedures satisfy a monotonicity property under some very plausible 
assumptions. As a corollary of this theorem, we show that any bound on the FDR that was proved for the 
step up procedure, holds also for the more conservative step down procedure as well. It should be noted 
that our proofs of control hold only for the independent case. For the dependent case, results for control are 
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even more scarce, and limited to certain specific types of dependency. We therefore studied the behavior 
of our procedures, compared to others known from the literature, under dependency, using simulations. 
In addition to studying behavior under dependency, our simulations also enabled us to understand the 
distribution of the fraction of false hypothesis, and in particular the probability of violating the bound for 
a particular given realization. Further research on this aspect of comparing procedures is needed and we 
expect it to provide interesting new insights and measures for comparisons of different procedures. 

We finally applied our procedures, as well as several others, to a large number of cancer-related expres- 
sion datasets. For both real and simulated data, our new procedures provided more rejections (separating 
genes) than the similar list of Benjamini and Hochberg and the very recently introduced irnproved bound 
of BKY Q], for a fixed desired value of the FDR. In some cases the improved bound of STS .24;] gives more 
rejection than our method, but as we have shown on synthetic data, when there are positive correlations, 
STS loses control of the FDR in a much more pronounced way than our procedure. To summarize: a 
researcher may either obtain a desired number of differentially expressed genes at a lower FDR, or get a 
longer list of such genes at the desired FDR level, at no added computational cost. We recommend using 
our IBHlog procedure for two-tailed tests, and IBHsum procedure for one-tailed test, to increase discovery 
power while controlling FDR levels. 
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A Proof of theorem [T]: 

Proof 1 Assume w.l.o.g. that the first mo hypothesis are null, and the next m — mo are the alternative. 
As is hinted from the bound, the key idea used in the proof is to consider a modified (and more liberal) 
estimator nSff' , which does not take into account pi. We also define the following modified qt: gl^' = -^hF)- 
From the monotonicity of rho we have: 

m^/> < mo, > qk (20) 



In addition, we define the following events: 

Cf' = {max{j:p[]Li)<*} = fc} 

y cW = {pWi)>ft, Vj = fc + l,..,m} (21) 

]<k 

where p'Jj < ■. < P(^-i) '^'"^ ordered m — 1 p-values excluding p\. 

Since C^i^^ and rhff^ depend only on p2, ■■,Pm, the following conditional independence relation holds: 

C'i^ ALp,\m^^ (22) 

Before giving our proof, we need the following two lemmas: 
Lemma 1 If pi is independent of p2..m then: 

Pr(7?«|{m«,pi<g«})<Pr(75W|{4M,pi<gii'}), V j < fc (23) 

Proof (lemma 1): 

The lemma's statement involves conditioning on pi < p. We first prove a point-wise auxiliary claim, 
conditioning on pi = p. Let g(p2, ■■,Pm\iTi'l^) be the conditional density function ofp2,..,Pm given . 
Denote /p,._(p) = Pr{D^^^^\{rh^/> ,p^ = p}). We prove below that f is monotonically non- decreasing. Let 
p < p' . Then: 

m 

fp...Ap) = Pr(Z)(i'|{m«,pi =ri) = Pr{ f| {p« i) > qA\{^lf ,Pi = P}) = 

j=k+i 



/ 



Pr{ n iPu-i)>~~~r^ z}\{pi^P,P2..,n})giP2,.;Pm\m'-/>)dp2..dp^< 

jlfc + l mo(p,P2..m) 

m 

Pr{ n iPu-1) > ^~~rr s}\{Pi=p',P2--m})giP2,-,Pm\m[f')dp2..dp„,,= 

jikli mo{p',P2..m) 

m 



Pr{ n M'-i) > =p'})=Pr{d^^\{m^^,p, = p'}) ^ f,,._Ap') (24) 



■ til "^0 
] = k + l 



Thus f IS a monotonically non- decreasing function. Therefore, the average of f over [0,p] is also 
non- decreasing in p: 
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P Jo P P P Jo 

A r fP2..A^)dx + {i-^)u,,„Ap)< 

P Jo P 

I fp I fp' I fp' 

— fp2..^{x)dx + — fp2..Ax)dx= — I .fp2._mix)dx 
P Jo P Jp P Jo 

And this in fact shows: 

Pr{Di'^\{7h^/^,Pi<p}) < Pr{Di}^\{m^^\pi<p'}), Wp < p' 



(25) 



(26) 



From here the lemma's claim follows by simple integration, noting that any can be treated as a 
constant given p2, ..,Pm-' 

(1) 

Pr{D['^\{m^/>,p^ < gf )}) = f p(p2, ..,p„|m<M) J^^ P (x)dpidp2..dp™ < 



I 

^ P2 



g{p2,-,Pm\ml^)-lYT I fp2 ^{x)dpidp2..dp„, = Pr{Dl!'\{ml/' ,pi < q)!^'}) 



.{1)1 



o/Q/. 



The next lemma follows the spirit 
Lemma 2 

k 

^Pr(Cj^^|{m(W,pi < q,}) < Pr(Z)«l{m«,pi < q,}) Vfc = 1, ..,m 
Proof (lemma 2): 

The proof is done by induction. For k = 1, eq. i2S\) is reduced to: 



(27) 



(28) 



Pr{C['>\{rhl'^,p^ < gi}) < Pr{D^^>\{m[/> ,p^ < gi}) 

And the two quantities are equal, since by definition Cj^' — D^"*^' . Assuming the correctness of eq. 
for k, we prove it for k + 1 using lemma\^ 
fe+i 

Y,Pr{Cf'^\{rh\^,pr < g,}) < Pr{Di'^\{m[^ ,p^ < qk}) + 
i=i 

Pr(C<^^J{m«,pi < gfe+i}) < Pr(D«|{mW,pi < q,+,}) + 
Pr{Ci'l,\{mi^,p^ < g,+i}) = Pr{Dl'l\{ml/^ ,p^ < g,+i}) 



(29) 



(30) 



By using eqs. 



E 



V 



R+ 



and lemma we are able to express the FDR as: 
^ Pr{R = k)E[-\R ^k]^Y. jP^iR = fc) Pr{p^ < qu\R ^ k) 



mo 



J2-Pr{R = k,pi < gfc) < mo^-Pr{R = k,pi < 
fe=i fe=i 



moj:lPr{Ci'\p, < gi^>) = mo ^ ^ / Pricf\p^ < gl^ V^*^)4<« dm« 

k=l k^l -^^0 " 

''O m-o fe^i " 
Jra\y> ml*' 

moqE 



mo 



(31) 
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B Designing the IBHsum estimator 



There are many ways to design the estimator rho such tliat it will satisfy Thm. [T] Here we choose to define 
mo using m'o as: 



mo — C(m) ■ min 



, {max{s{m),2 pj] 



(32) 



Our goal is to calculate the optimal correction factors C'{m) and s(m) such that mo will still satisfy eq. 
(|13p . note that setting C — 1 and s = gives the (uncorrected) unbiased estimator rhg. We can bound 
£■[— Iyj] by neglecting the alternative p- values: 



C(m) 



lin l^m, max(s(m), 2 J^^JLj 



dm] 



■E 



min \m,max{s(rn),2Y^^j^22Pj)^ 



Define the r.v. Zm^^ = ^Y^^^iPi ^'^'^ denote its density by /i'™''-'(zmo)- Then:s 



E 



-j=2 

1 

C(m) 



s Jo 



dt + — I h'-r°\t)dt 



(33) 



(34) 



We want to find an optimal pair (C, s) satisfying the above inequality. First, we assume a given value of s 
and find the optimal (smallest possible) C. Had we known mo, and since we want £;[l/m[,'^] < l/mo we 
would have chosen C to be: 



C{m, mo, s) = mo 



1 rs rm ui^o) {-f-\ 1 /'2m 



(35) 



Since in the above equation mo is unknown we must maximize over all of its possible values C(m, s) = 
maxC(m, mo, s). We are now left with the choice of s. Notice that as we increase s from zero, the maximal 

mo 

C decreases but at some point it remain the same, since when mo = m our bound for C is independent of 
s - we set this point as the optimal s, s(m) = min{s : s — argmin[C{m, s')]}. Fig. [7]presents an example 

s'6[0,m] 

for the dependency of C on the mo and s. 



m=500, S=m/20 

m=500, S=98 

m=1000, S=147 



J 

0.2 0.4 0.6 0.8 1 

m„/m 

Figure 7: An example of the dependency of C{m,mQ, s) on mp for different values of s. As can be seen s 
affects the location and the size of the left maximum, whereas the left maximum (at mo — m) is independent 
of s. We choose s such that the maximum of C is the smallest possible, and take the minimal s which achieves 
this. This gives s = 98 for m = 500 and s = 147 for m = 1000. 

Using numerical integration we calculate C(mo) for fixed m, its behavior for different values of m and 
s is presented in Fig. [T] As can be seen s controls the location of the left maximum and we choose s to 



1.04 
1.03 

C 

1.02 
1.01 



23 



be such that the majdmal C is minimal, this happen when the left maximum is the same as the value at 
ma/m = 1. The resulting s(m),C(m) are presented in Fig. and b. In the above formula for C(m,mo) 
the density /li™"' (z), is the density of the uniform sum distribution. Since calculation of the above integrals 
with the exact uniform sum distribution cause numerical difflculties we approximated it by a Gaussian 
distribution. For large values of m, the approximation converges to the exact distribution according to the 
central limit theorem. For small values of m (m < 40), we were able to compare the C{m) calculated by 
the exact uniform-sum distribution with the C{m) which were calculated by the Gaussian approximation. 
This comparison shows that Cappr oximate{m) > Cexactim), and that the rate of convergence is faster that 
see Fig. [8l thus calculating C{m) using the normal approximation is conservative (gives higher C) 
and as expected converge to the values of C calculated by the exact distribution. 



0.2 




,1 , i , , i , 1 

5 10 15 20 25 30 35 40 

m 



Figure 8: Difference between C(m) calculated using the normal approximation and the C(m) calculated by 
the exact uniform sum distribution. Note that the the sign is positive (means that the approximation is 
conservative) and the rate of convergence is faster than 1/m. 



C Proof of claim [T] 

Proof (claim 1): 

The proof is accomplished by bounding E[l/rn'(p] and using Thm. QJ 

For c > 0, the function 4>{x) = l/{x -\- c) is convex. Therefore, we can use Jensen's inequality with this 
function and get: 

1 1 



i^nM^] -^[ 2-Er.,iog(i-,J ^ 



2 - e:" 2 iog(i - p.)\ ^----'----^ 2 - YZ2 iog(i - P.) 

^'^-^"'-^'^'"-Ei-iogii-p,) iog(i -po] - Er=2iog(i -po - 



< 



'P2..™ -EIlolog(l -P»)J ^TJHo^ogil - Pi) 

^ f v-^o = ^ 1 (36) 

i-E»J?,iog(i -pO-I ^Y.^Jo^Ogp,\ 

where po ~ U[0, 1] is an auxiliary random variable defined to be independent of pi,,m,- 
Define Ymg ~ Il^fgPi. Since po..mo i.i.d. U[0,1], Ymo has the following density function: 

hY^„{t) = ^-^{logtr" (37) 
mo! 

Define also Xmo = —2 log YmQ , then Hx„^„ (t) = 1 — Hym^ (e"*''^) and 



hx^ (t) = Hy^ (£-'/') = ——^ — - (38) 
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Therefore Xmo o- chi-square r.v., Xm„ ~ 'x^{2m(,) + 2). Using this fact, we get 

E\l/m f'] < E\2/X^„] = / — - ■ -dt^ — / , -dt = — (39) 

^' ^' °^ Jo 2-omo! t mo Jo 2^0(^0-1)! mo ^ ' 

Therefore, according to Thm. QJ we immediately get: 

FDR < moq— = q (40) 
mo 



D Proof of monotonicity theorem 

Proof 2 Assume w.l.o.g. that the first mo hypothesis are null, and the next m — mo are the alternative. 
For each procedure, R is some function of p = (pi, ..,pm) which depends only on the order statistics 
Pq — (p(i), ..,P(m))- We therefore need to prove: 

fp^...AP)dP^^ < I .U^...AP)dp^^ (41) 

RJip) Jp RJip) 
Note that the Vi 's depend on the exact realization p while the Ri 's depend only on the order statistics 

Pq, and thus we can write equivalently: 

,^ ... E[V^{p)\pq] ^ f ,^ , E[V2{p)\po] 

Where g is the joint density of the order statistics pq , given by: 

5(P())= E fpi.-.,A^~'iP0)) (43) 

That is, g is obtained by summing over all m! permutations a on m elements in the symmetric group 
Sm, each permutation a transferring different configuration of the pi 's into the same order statistics vector 
Pq — a'(p) = (pcti, ■■,Pcr^), and thus p is given by applying the inverse permutation to pq. Note that 
under the assumption that the p-value are independent and the null p-values are U[0, 1], g can be written 
as: 

m 

S(P{))= E /fi..-™('^"'(P0))= E [ n /(^K-)] (44) 

In order to show that the inequality J^^P holds for the integral, it is enough to show it for each realization 
of the order statistics pq . Thus, we want to show: 

mmm < E[V2iPi\Po] (45) 

RtiPo) ~ RtiPo) " 

Or: 

E[Vi\Po] < $^E[V2\Po], VPO (46) 
It is enough to show the claim for the case Ri{f}} — k,R2{p) = fc + 1 for some < fc < m — 1, and 
then it will follow by induction for the case R2{p) — Ri{p) > 1 ■ Define the r.v. Xj{p) = l{pyj„uH}, i-e- 
the indicator for the event that the j-th order statistic is null. 

aj < mo ^ ^ 

(47) 

aj > mo 




where a here is the permutation transferring p to pq. We thus need to prove: 

k , k+1 , 

i5[y|fc;P0] =E^[^^(P^iP{)] < ;fc^E^[^^(P^IP()] = (48) 



J=l 3=1 
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It is therefore enough to prove 

E[xjip)\po] < E[xk+i{p)\po], Vj < fc + 1 (49) 

or in other words, that E[xj\pQ] is monotonically non- decreasing in j. We will show that E[xk\p()] < 
E[xk+i\p()] and then the claim follows again by induction. 

E[xkip)\Po]^ J2 f'-(p = '7~^(P())|P())a;fc((T~^(p())) = 1. J2 f{'^~^(Po))^>={(T'^{Po)) = 

1 ^ 

= 27^ E [ n /(Pk-))]1w<"^o} (50) 

Where Z[pq) is a normalization constant depending on the order statistics, and we have used the 
independence of all p-values. For each permutation on m elements a , we define a' to be the permutation 
identical to a, except that and (Jfe+i are swapped, i.e. cr^. = o"fe+i,(T^^i = ah- Then we can write: 

E[xk\p{)\ - E[xk+i\p{)\ = 

E U n /(pk-1))1iw<'"o} - [ n /(pk-'))]^k+i<™o} f (51) 

The usage of the swapped permutation in the above sum makes the value of the two indicators identical, 
and thus we sum only over permutations a such that Ok < ttiq, i.e. when p(^if) is null. In the case where 
P(k+i) is ciiso null (i.e. ak+i < rno) the difference is zero and we can omit this case from the sum, while 
in the case where P(k+\) is alternative (ak+i > mo) one element in the product is different and we get: 



E[x,Wo] - £[x,+ilpo] = ^ E M^.<^o<^.+.} \ fl /(P(.ri))l [l - TT^^l < (52) 
where the last inequality follows from the monotonicity assumption on f{p). 



E Simulations study details 

A simulation study was done in order to determine the performance of the proposed procedure and compare it 
to existing procedures. We generated multivariate Gaussian random variables (and corresponding p-values) 
in similar to 'H] and previous works. First randomize a vector of i.i.d. r.v.s. Y\, , , Ym+i ~ A'^(0, 1); then, 
given the parameters m,mo,fJ,i and p, build the vector X\,,,Xm (which is the test statistics vector) as 
follows: the first mo elements are Xi — y/pYm+i + \/l — pYi, and the remaining m — mo elements are 
■^i ~ ^/pYm+\ + — pYi + p,\. Here mo/m is the fraction of true hypotheses, p is a dependency factor 
(the correlation coefficient between Xi and Xj for i ^ j), and /ii is the mean of the false hypotheses test 
statistics (the signal intensity). The resulting vector X is such that its first mo variables come from the 
N{0, 1) distribution, and the remaining m — mo variables come from N{pi, 1) distribution, where for any 
Xi and Xj (either both, one or none of them are null) their correlation coefficient ts p. The p-values were 
calculated using 2 tailed z-test (p — 2^[—\x\)). The number of simulations for each case was 50000, which 
provided highly accurate and reproducible results. Since the simulation results depend on several parameters, 
mo/m, fii, p,m, we have chosen to vary two parameters at a time, and present the results using isolines of 
the actual FDR (or any other quantity). These isoline plotted in Figs. and[^ describe the performance 
of the IBHsum and IBHlog procedures, respectively, on simulated data in the {mo/m, p,i) plane. 
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Figure 9: Isolines of E{y/R^), measured for the IBHlog procedure by simulations, presented in the {iJ,i,mo/m) 
plane. The solid lines are for the step-up procedure and the dashed lines for the step-down procedure, (a) and 
(c) are for the independent case (p = 0). (b) and (d) are for the positive dependency case p = 0.8. The FDR 
levels are q = 0.05 in (a),(b) and q = 0.2 in (c),(d). In (b) we find EiV/R^) > 0.05 for large /xi, in violation 
of the bound q = 0.05. In similar to the behavior for IBHsum, the step-up and step-down procedures tend 
to coincide under dependency, while for independent p-values the step-down procedure is more conservative, 
especially for weak signal (small /Ui). 
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Figure 10: Results obtained for synthetic data with m = 500 hypotheses; toq was varied, the FDR was set 
at q = 0.2, the mean of the distributions Pi was fii ~ 3.5 and the data were drawn either with covariance 
p = [(a), (c) and (e)] or p = 0.8 [(b), (d) and (f)]. Six methods were compared: oracle (ORC), BH95, 
BKY, STS and our two IBH procedures (in a step down manner), showing E{V/ R'^) in (a) and (b), the power 
E{S)/mi in (c) and (d), and the standard deviation (st.d.) of V/R^ in (e) and (f), for the independent case 
and positively dependent cases, respectively. 
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